Gedae  is  an  integrated  application  development 
environment.  It  has  been  under  development 
since  1987  -  though  the  concepts  involved  are 
rooted  in  much  earlier  work  done  in  the  areas  of 
data  flow  and  hardware  simulation.  In  Gedae 
we  have  developed  a  language  for  describing  an 
architecture-independent  functional 
specification,  a  virtual  machine  on  which  the 
application  runs,  and  transformations  that  create 
an  efficient  implementation  of  the  application 
that  runs  on  the  virtual  machine.  In  this  paper  we 
discuss  three  topics  -  the  language,  the  virtual 
machine  and  the  transformations. 

The  language  was  developed  with  two 
requirements  -  any  functionality  must  be  easily 
expressible,  and  the  language  must  be 
transformable  into  an  efficient  implementation 
on  the  virtual  machine.  The  Gedae  Language 
consists  of  both  the  Gedae  Primitive  Language 
and  the  Gedae  Graph  Language.  Much  of  the 
expressiveness  is  in  the  primitive  description 
language.  The  language  has  over  50  expression 
features  to  define  the  behavior  of  functional 
ports.  Port  data  flow  requirements  can  be 
specified  either  prior  to  runtime  (static)  or  at 
runtime  (dynamic).  Ports  can  add  segment 
boundary  markers  on  the  data  flow  streams, 
thereby  breaking  the  stream  into  independent 
data  sets.  Exclusive  families  of  ports  can  send 
data  down  one  branch  or  another  to  implement 
mode  changes  while  maintaining  coherent  state 
vectors  used  by  all  the  modes.  Primitives  can 
maintain  their  own  local  state  variables  and 
provide  methods  for  execution,  startup, 
termination  and  handling  the  beginning  and 
ending  of  segment  processing.  The  Gedae 
Graph  Language  allows  the  hierarchical 
development  of  graphs  consisting  of  primitives, 
parameters  and  other  Gedae  graphs.  The  graph 
language  can  describe  families  of  these  entities 
to  allow  parameterized  expression  of 
parallelism.  The  resulting  language  permits 
direct  expression  of  signal  and  data  processing 
algorithms,  distribution  for  providing  load 
balancing  and  fault  tolerance,  and  application  (or 
software,  or  mode)  control. 

To  achieve  efficiency,  the  language  and  virtual 
machine  were  codesigned.  The  virtual  machine 
contains  a  runtime  kernel  that  executes 
components  generated  by  the  transformations. 
For  example,  the  static  scheduler  executes 
predetermined  execution  sequences  based  on 
static  data  flow  ports,  and  the  dynamic  scheduler 
executes  groups  of  static  schedules  that  interface 
through  dynamic  data  flow  ports.  The  virtual 
machine  manages  the  segment  processing  and 
controls  the  efficient  and  timely  transfer  of 
distributed  state  vectors  between  processors. 
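
The split between static and dynamic scheduling described above can be illustrated with a small model. This is only a sketch under stated assumptions: the class `StaticSchedule`, the function `dynamic_scheduler`, the queue name `q` and the box names are hypothetical and are not part of the Gedae kernel. Each static schedule is a fixed firing sequence, and the dynamic scheduler runs whichever schedule currently has enough data on its dynamic input queue.

```python
from collections import deque


class StaticSchedule:
    """A fixed firing sequence plus its dynamic-queue interface (hypothetical model)."""

    def __init__(self, name, order, consume=None, produce=None, runs=None):
        self.name = name
        self.order = order      # predetermined box firing order
        self.consume = consume  # (queue name, tokens needed per run) or None
        self.produce = produce  # (queue name, tokens emitted per run) or None
        self.runs = runs        # finite run budget for source schedules

    def ready(self, queues):
        if self.runs is not None and self.runs <= 0:
            return False
        if self.consume is None:
            return self.runs is not None  # sources need a run budget in this sketch
        queue, need = self.consume
        return len(queues[queue]) >= need

    def run(self, queues, log):
        if self.consume is not None:
            queue, need = self.consume
            for _ in range(need):
                queues[queue].popleft()
        # The static part: boxes always fire in the same precomputed order.
        log.extend(f"{self.name}.{box}" for box in self.order)
        if self.produce is not None:
            queue, count = self.produce
            queues[queue].extend([None] * count)
        if self.runs is not None:
            self.runs -= 1


def dynamic_scheduler(schedules, queues):
    """Run any static schedule whose dynamic input has enough data, until none can run."""
    log = []
    fired = True
    while fired:
        fired = False
        for schedule in schedules:
            if schedule.ready(queues):
                schedule.run(queues, log)
                fired = True
    return log


queues = {"q": deque()}
front = StaticSchedule("front", ["read"], produce=("q", 1), runs=4)
back = StaticSchedule("back", ["fft", "write"], consume=("q", 2))
log = dynamic_scheduler([front, back], queues)
```

In this toy run the `back` schedule only executes once two tokens have accumulated on `q`, while the order of boxes inside each schedule never changes.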

The virtual machine also allows for
vendor-specific optimizations of processing, such
as setting data transfer parameters. A thin layer
over the vendor-provided vector processing
libraries allows primitives to execute efficiently.

One  of  the  unique  features  of  Gedae  is  the 
visibility  of  the  implementation  and  the 
execution  it  provides.  This  visibility  is  possible 
because  the  language,  the  transformations  and 
the  virtual  machine  are  all  part  of  Gedae.  The 
visibility  allows  the  generation  of  detailed 
execution  timelines  and  the  symbolic  viewing  of 
any  memory  in  the  system.  Primitive  execution, 
queue  state  and  data  transfers  between 
processors  can  be  dynamically  viewed  when  the 
application  is  running. 

The  transformations  are  the  central  part  of 
Gedae  and  make  possible  the  efficient  execution 
of  the  application  expressed  in  the  Gedae 
Language  on  the  Gedae  Virtual  Machine.  The 
transformations are fully automated but can be
guided by user-supplied implementation
parameters to control distribution, strip mining,
data transfers, scheduling priorities (both static
and dynamic), queue policies and memory
management. Some of the transformations
directly modify the graph into an equivalent
graph to implement a user-entered
implementation decision. To distribute a graph,
the user specifies a partitioning of the graph and
a mapping of the graph to individual processors.
Gedae modifies the graph by inserting send and
receive primitives that run on the separate
processors and maintain the data flow and
connectivity of the graph. The user does not have
to modify the graph to achieve these results.
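
The send/receive insertion described above can be sketched as a graph rewrite. This is a minimal illustration, not Gedae's implementation: the function `insert_send_recv`, the edge-list representation and the box names are assumptions made for the example. Every edge whose endpoints are mapped to different processors is split by a send box on the source processor and a receive box on the destination processor.

```python
def insert_send_recv(edges, mapping):
    """Split every cross-processor edge with a send/recv pair.

    edges:   list of (source box, destination box)
    mapping: dict from box name to processor number
    Returns the rewritten edge list and the extended mapping.
    """
    new_edges = []
    new_mapping = dict(mapping)
    for n, (src, dst) in enumerate(edges):
        if mapping[src] == mapping[dst]:
            new_edges.append((src, dst))  # same processor: edge is unchanged
            continue
        send, recv = f"send_{n}", f"recv_{n}"
        new_mapping[send] = mapping[src]  # send runs next to the producer
        new_mapping[recv] = mapping[dst]  # recv runs next to the consumer
        new_edges += [(src, send), (send, recv), (recv, dst)]
    return new_edges, new_mapping


edges = [("source", "filter"), ("filter", "fft"), ("fft", "sink")]
mapping = {"source": 0, "filter": 0, "fft": 1, "sink": 1}
new_edges, new_mapping = insert_send_recv(edges, mapping)
```

Changing only `mapping` and rerunning the function gives a different distributed graph from the same original edge list, which mirrors the point that the user's graph itself is not edited.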


For  example,  the  following  graph  has  dynamic 
queues  and  is  distributed  to  four  processors  by 
the  user: 


It  is  transformed  into  a  new  graph,  as  seen 
below,  with  send  and  receive  boxes  inserted  to 
manage  communications  and  dynamic  queues 
also  inserted  to  handle  dynamic  data  flow 
boundaries.  Other  transformations  include 
modifying  the  graph  to  implement  strip  mining 
of  vectors,  adding  primitives  to  implement 
delay,  and  adding  primitives  to  allow 
communication  of  the  graph  to  the  host  program 
or  to  other  Gedae  applications.  Data  structures 
are  also  created  to  implement  segmentation, 
mode  control  and  distributed  state  coherency. 

A  sonar  signal  processing  graph  will  be  used  to 
demonstrate  how  a  graph  is  transformed  into  an 
implementation.  It  will  be  shown  how  the 
transformations  can  be  used  to  modify  the  graph 
execution  without  changing  the  Gedae  Language 
expression  of  the  graph.  The  resulting 
implementations  will  be  contrasted  with  how  the 
same  implementations  would  be  achieved  using 
traditional  programming  techniques. 

What is Gedae?

Gedae is a block diagram language ...

Express signal and data processing algorithms,
parallelism, load balancing, fault tolerance and mode
control

... that Gedae transforms under user control ...

User can set optimization parameters that are
independent of the graph to guide transformation

... to operate efficiently on a virtual machine.

Complete systems can be developed independent of
the target system without losing runtime efficiency

Gedae Language

•  Gedae provides application information through
   - modules with well-defined behavior
   - ports with well-defined characteristics
   - and manifest connectivity with explicit
     sequential and parallel execution paths
•  This information is implicit in most languages
•  Gedae makes the information explicit
   - over 50 different information expression features

Information provided by language allows Gedae to analyze
and efficiently implement algorithms

Gedae Transformations

The block diagram is transformed
using over 100 algorithms.

The transformations establish the:
- Order of execution
- Queue sizes
- Granularities
- Memory layout
- Dynamic schedule parameters
- Data transfer types and parameters
- Mode control

The Gedae transformations build a detailed model of the deployed
application. Gedae uses that information to provide visibility.

Gedae Virtual Machine (VM)

Gedae provides the following components:
- Command handler
- Dynamic scheduler
- Segmentation support
- Primitive support
- Visibility support

The vendor provides:
- Inter-processor communications
- Optimized vector libraries
- Other basic services

Three Examples

Real-Time Space-Time Adaptive Processing (RT-STAP)
- MITRE benchmark graph
- Illustrates efficient parallel execution of large graph

Multilevel Mode Graph
- Illustrates nested mode control with distributed state
- Dynamic data application

Sonar Graph
- Illustrates large data reduction during processing

Each example illustrates features of the language,
transformations, and virtual machine

RT-STAP: Language

Families permit replicating box and data elements

Instantiation constants control the size of the graph
Routing boxes allow equation-based connectivity

RT-STAP: Transformations

User maps primitives to physical processors

Gedae transforms graph by inserting send/receive
primitives to communicate between partitions

Gedae automatically creates executables to run on each
processor

Different mappings can be tried without modifying the
graph - the needed transformation happens automatically

•  User can set transfer properties on send/recv pairs with
   Transfer Table
•  Transformations automatically set parameters to send/recv
   pairs to communicate these properties to running application

User can guide transformations to optimize implementation

RT-STAP: Running on VM

Send/Recv webs show interprocessor communication
and uncover synchronization problems

[Screenshot: symbolic view of the state of primitive
embeddable/stream/fird, listing granularity, in, C and N
with their values and addresses, next to a Memory Map]

Mode Control: Language

•  Branch boxes make mode changes and mark segment
   boundaries
•  “Exclusive” branch outputs show where resources can
   be shared
•  State shared between modes is explicitly declared
   in the graph

The Gedae primitive language directly supports segmented
data processing, sharing of resources, and distribution of
state

There are mi mode instances of the mode. Sequential triggers are passed to different instances of
ModeInput to signal the input box to receive the next frame of data. At this level we are routing the
trigger to the appropriate instance. Notice that there is a separate input box for each mode and each
instance. The exclusive branching of the triggers ensures that memory will be shared by all the input
boxes.

Branch box copies input data stream
to one of a family of outputs based on
a control stream. Output is:

•  Segmented - the box will add
   segment boundaries to the output
•  Dynamic - the box will state how
   much data is produced on the output
   at runtime
•  Exclusive - only one of the family of
   F outputs gets data on any firing

Allows sharing of resources and
state.

The Gedae extensible language
has no “built-in” primitives.

8000+ delivered primitives.

Users can add custom primitives

Gedae, Inc.
www.gedae.com


Name: cp_branchf_e

Input: stream ControlParamRec in;
Input: stream int c;
Local: int last;
Output: exclusive segmented dynamic stream ControlParamRec [F]out;

Reset: { last = -1; }

Apply: {
    int g, i;
    int prdc = 0;    /* tokens written since the last produce() */

    for (g=0; g<granularity; g++) {
        int j = c[g];
        if (last != j) {
            /* mode change: report output and end the old segment */
            if (0<=last && last<F) {
                produce(out[last], prdc);
                prdc = 0;
                segment(out[last], SEGMENT_END);
            }
            last = j;
        }
        /* copy the token only when the selected output exists */
        if (0<=j && j<F) {
            *out++ = *in;
            prdc++;
        }
        in++;
    }
    produce(out[last], prdc);
}
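
The routing behavior of the branch primitive can be modeled outside Gedae to make the segment handling easier to follow. The following is a rough Python model, not the Gedae primitive itself: the function name `branch`, the list-of-segments representation and the sample data are assumptions. It mimics the listing above by sending each input token to the output selected by the control stream and closing a segment on the previous output whenever the control value changes.

```python
def branch(data, control, F, last=-1):
    """Model of an exclusive, segmented, dynamic branch.

    data, control: equal-length sequences (tokens and output selectors)
    F:             number of outputs in the family
    Returns (outputs, last); outputs[k] is the list of segments sent to
    output k. A trailing segment is still open in the real primitive,
    since no SEGMENT_END has been issued for it yet.
    """
    outputs = [[] for _ in range(F)]
    current = []  # tokens produced since the last segment boundary
    for token, j in zip(data, control):
        if j != last:
            if 0 <= last < F:
                outputs[last].append(current)  # produce + SEGMENT_END
            current = []
            last = j
        if 0 <= j < F:
            current.append(token)  # out-of-range selectors drop the token
    if 0 <= last < F and current:
        outputs[last].append(current)  # final produce, segment left open
    return outputs, last


outputs, last = branch(list("abcdef"), [0, 0, 1, 1, 0, 2], F=2)
```

Only one output receives data for any given token, which is the property the `exclusive` keyword declares and which lets the downstream modes share resources.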

Mode Graph: Transformation

User can set partitioning, mapping, data transfer methods,
granularity, priority, queue sizes and schedule properties
from the group control dialog


Mode Graph: Running on VM

•  Each mode requires a different number of processors
•  Branch boxes at one level are responsible for
   the dynamic distribution

VM runtime kernel enforces dynamic data driven
execution. Send and receive primitives and state transfer
primitives use BSP of virtual machine to transfer data

Primitives to send and receive state are automatically added
by transformations

Messages generated by Virtual Machine at mode
change boundaries efficiently coordinate state transfers

[Screenshot: ModeWSeg Trace Table]

Result is efficient transparent use of shared state on
distributed processing system


Sonar: Language

Sonar Graph creates low bandwidth output from high
bandwidth input data

Sonar: Language

Connectivity + Port Descriptions
gives information needed to
schedule graph

- mx_vx produces R=120 tokens out for every 1 token in
- vx_multV box must fire 120 times for each firing of the
  mx_vx box
- vx_fft box fires one time for each firing of vx_multV box

Simple predetermined schedule
generated from graph and info
embedded in primitives

[Figure: graph connecting the boxes mx_adjoin (ports a, b, out),
mx_vx (ports in, out, R), vx_multV (ports in, out, V) and vx_fft
(ports in, out). One output port is declared as:
inplace stream complex out[C](R) = in;]

[Figure: Static Schedule Timeline for mx_adjoin, mx_vx, vx_multV
and vx_fft: fire at granularity 120, 1 time]

Can create a multirate graph that has boxes firing at
different granularities
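
The firing counts quoted on the slide follow from balancing token rates along the chain. The sketch below shows that arithmetic under stated assumptions: the function `repetitions` and the tuple layout are hypothetical, the 1-in/1-out rates for vx_multV and vx_fft are inferred from the statement that each fires once per upstream firing, and this is a generic rate-balancing calculation rather than Gedae's scheduler.

```python
from fractions import Fraction
from math import lcm


def repetitions(chain):
    """Smallest integer firing counts that balance a linear pipeline.

    chain: list of (name, tokens consumed per firing, tokens produced
           per firing), in pipeline order.
    """
    reps = [Fraction(1)]
    for (_, _, produced), (_, consumed, _) in zip(chain, chain[1:]):
        # Tokens produced upstream must equal tokens consumed downstream.
        reps.append(reps[-1] * produced / consumed)
    scale = lcm(*(r.denominator for r in reps))
    return {name: int(r * scale) for (name, _, _), r in zip(chain, reps)}


chain = [
    ("mx_vx", 1, 120),   # R=120 tokens out for every 1 token in
    ("vx_multV", 1, 1),  # assumed 1 token in, 1 token out
    ("vx_fft", 1, 1),    # assumed 1 token in, 1 token out
]
counts = repetitions(chain)
```

Because the rates are known before runtime, the resulting counts can be fixed into a predetermined schedule, which is what the static timeline on the slide depicts.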


Sonar: Transformation

•  User can place boxes in subschedules to strip-mine
   the vector processing
•  Allows use of fast memory
•  Can reduce memory usage

[Figure: Static Schedule Timeline for mx_adjoin, mx_vx, vx_multV
and vx_fft (fire at granularity 120, 1 time) compared with the
Subscheduled Timeline (fire at granularity 1, 120 times)]

Multirate graphs can be implemented using subscheduling
to improve speed and reduce memory usage

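
The memory effect of subscheduling can be illustrated with a toy pipeline. This is a sketch only: the functions `run_full` and `run_subscheduled`, the two stand-in stages and the 8-element rows are assumptions chosen for the example, and element counts stand in for bytes. Firing each stage once at granularity 120 materializes a full intermediate buffer, while firing the stages at granularity 1, 120 times, keeps only one row of intermediate data alive.

```python
def run_full(rows, stages):
    """Fire every stage once over all rows; track the largest intermediate buffer."""
    peak = 0
    buf = rows
    for stage in stages:
        buf = [stage(row) for row in buf]
        peak = max(peak, sum(len(row) for row in buf))
    return buf, peak


def run_subscheduled(rows, stages):
    """Fire the stages one row at a time; only one row is in flight."""
    out, peak = [], 0
    for row in rows:
        for stage in stages:
            row = stage(row)
            peak = max(peak, len(row))
        out.append(row)
    return out, peak


def scale(row):
    return [2.0 * x for x in row]  # stand-in for a vector multiply


def square(row):
    return [x * x for x in row]    # stand-in for a later vector stage


rows = [[float(i + j) for j in range(8)] for i in range(120)]
full_out, full_peak = run_full(rows, [scale, square])
sub_out, sub_peak = run_subscheduled(rows, [scale, square])
```

Both orders compute the same result; only the size of the live intermediate data differs, which is also why a strip-mined working set has a better chance of fitting in fast memory.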

Auto-Subscheduling Tool

[Screenshot: Group 1 Gain Hier Table, with columns including Name,
TotalG, Bytes, TDiv, Bytes*G, Boxes and Subsched, listing
Schedule 1 and subschedules 1, 1.1, 1.2 and 1.2.1]

Schedule Information Dialog

User can put boxes into named
subschedules manually - but can be difficult

Auto-Subscheduling Tool puts boxes in
subschedules automatically
- Finds nested sets of connected boxes
  running at common granularities
- Automatically sets subscheduling levels

Group 1 Schedule Parameters

Name                        Size
Part default
Schedule 1                  1448     (Priority 0, Policy dataflow)
SubSched 1
  Segment default           2457600
  Segment parent memory     480
SubSched 1.1
  Segment default           65536
  Segment parent memory     32784
SubSched 1.2
  Segment default           143040
  Segment parent memory     15360
SubSched 1.2.1
  Segment default           61720
  Segment parent memory     960

Auto-subscheduling has reduced memory needed by graph
from 250 Mbytes to about 2.5 Mbytes - 100x improvement

Sonar: Running on VM

Multiple levels of subscheduling evident on Trace Table

Conclusion

Gedae Block Diagram Language allows
simple expression of a wide range of
algorithms

User optimization information can be added
without modifying block diagram

100+ transformations create efficient
executable application from language and
user information

Application runs efficiently on Virtual Machine

VM provides portability and visibility

Poster

HPEC 2004

Gedae: Auto Coding to a Virtual Machine

Authors: William I. Lundgren,
Kerry B. Barnes, James W. Steed

Gedae’s Structure

The block diagram is transformed
using over 100 algorithms.

The transformations establish the:
- Order of execution
- Queue sizes
- Granularities
- Memory layout
- Dynamic schedule parameters
- Data transfer types and parameters
- Mode control

[Figure: block diagram with the elements Functional Specification,
Implementation Specification, Transformations, Detailed Model,
Generation, Deployable Application, Virtual Machine and
Heterogeneous HW; a key distinguishes User, Gedae and Vendor parts]

The Gedae transformations build a detailed model of the deployed
application. Gedae uses that information to provide visibility.